Learning Unsupervised Hierarchies of Audio Concepts
Music signals are difficult to interpret from their low-level features, perhaps even more so than images: e.g. highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was proposed to adjust explanations to the right abstraction level (e.g. detecting clinical concepts in radiographs). These methods have yet to be applied to music information retrieval (MIR).
In this paper, we adapt concept learning to the realm of music, with its
particularities. For instance, music concepts are typically non-independent and
of mixed nature (e.g. genre, instruments, mood), unlike previous work that
assumed disentangled concepts. We propose a method to learn numerous music
concepts from audio and then automatically hierarchise them to expose their
mutual relationships. We conduct experiments on datasets of playlists from a
music streaming service, serving as a few annotated examples for diverse
concepts. Evaluations show that the mined hierarchies are aligned both with ground-truth hierarchies of concepts, when available, and with proxy sources of concept similarity in the general case.
Comment: ISMIR 2022
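For concreteness, below is a minimal sketch of the general recipe this abstract describes: learning a per-concept direction from a few annotated examples (here, a linear probe over precomputed audio embeddings) and then hierarchising the concepts by clustering those directions. The embeddings, concept names and clustering criterion are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch: concept probes + hierarchisation over audio embeddings.
# Everything below (embeddings, concept sets, linkage criterion) is a placeholder.
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_tracks, dim = 500, 128
embeddings = rng.normal(size=(n_tracks, dim))   # pretrained audio embeddings (placeholder)

concepts = {                                    # a few positive examples per concept (placeholder indices)
    "rock": rng.choice(n_tracks, 40, replace=False),
    "metal": rng.choice(n_tracks, 40, replace=False),
    "jazz": rng.choice(n_tracks, 40, replace=False),
    "piano": rng.choice(n_tracks, 40, replace=False),
}

# Learn one linear probe per concept; its normalized weight vector acts as a concept direction.
concept_vectors = []
for name, pos_idx in concepts.items():
    y = np.zeros(n_tracks)
    y[pos_idx] = 1
    clf = LogisticRegression(max_iter=1000).fit(embeddings, y)
    concept_vectors.append(clf.coef_[0] / np.linalg.norm(clf.coef_[0]))
concept_vectors = np.stack(concept_vectors)

# Hierarchise concepts by the similarity of their directions (average-linkage clustering).
Z = linkage(pdist(concept_vectors, metric="cosine"), method="average")
tree = dendrogram(Z, labels=list(concepts.keys()), no_plot=True)  # inspect merge order / tree
print(tree["ivl"])
```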
Of Spiky SVDs and Music Recommendation
The truncated singular value decomposition is a widely used methodology in
music recommendation for direct similar-item retrieval or embedding musical
items for downstream tasks. This paper investigates a curious effect that we show occurs naturally on many recommendation datasets: spiking formations in the embedding space. We first propose a metric to quantify the strength of this spiking organization, then mathematically prove that its origin is tied to underlying communities of items of varying internal popularity. With this new-found theoretical understanding, we finally open the topic with an industrial use case: estimating how the top-k similar items of music embeddings will change over time as new data is added.
Comment: Accepted for RecSys 2023 (Singapore, 18-22 September 2023)
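As a rough illustration of the setting, the sketch below factorises a user-item interaction matrix with a truncated SVD and retrieves top-k similar items from the resulting item embeddings. The synthetic matrix, the rank and the cosine-similarity retrieval are assumptions made for the example, not the paper's exact pipeline or its spikiness metric.

```python
# Hedged sketch: truncated SVD of an interaction matrix for similar-item retrieval.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

n_users, n_items, rank = 2000, 500, 32
interactions = sparse_random(n_users, n_items, density=0.02,
                             random_state=0, format="csr")  # synthetic user-item matrix

# Truncated SVD: one common convention takes the right singular vectors,
# scaled by the singular values, as item embeddings.
u, s, vt = svds(interactions, k=rank)
item_emb = vt.T * s                               # shape (n_items, rank)

# Top-k similar items by cosine similarity in the embedding space.
unit = item_emb / (np.linalg.norm(item_emb, axis=1, keepdims=True) + 1e-12)

def top_k_similar(item_id, k=10):
    sims = unit @ unit[item_id]
    neighbours = np.argsort(-sims)
    return [j for j in neighbours if j != item_id][:k]

print(top_k_similar(42, k=5))
```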
Explainability in Music Recommender Systems
The most common way to listen to recorded music nowadays is via streaming
platforms which provide access to tens of millions of tracks. To assist users
in effectively browsing these large catalogs, the integration of Music
Recommender Systems (MRSs) has become essential. Current real-world MRSs are
often quite complex and optimized for recommendation accuracy. They combine
several building blocks based on collaborative filtering and content-based
recommendation. This complexity can hinder the ability to explain
recommendations to end users, which is particularly important for
recommendations perceived as unexpected or inappropriate. While pure
recommendation performance often correlates with user satisfaction,
explainability has a positive impact on other factors such as trust and
forgiveness, which are ultimately essential to maintain user loyalty.
In this article, we discuss how explainability can be addressed in the
context of MRSs. We provide perspectives on how explainability could improve
music recommendation algorithms and enhance user experience. First, we review
common dimensions and goals of explainability in recommenders and, more generally, of
eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which
these apply -- or need to be adapted -- to the specific characteristics of
music consumption and recommendation. Then, we show how explainability
components can be integrated within an MRS and in what form explanations can be
provided. Since the evaluation of explanation quality is decoupled from pure
accuracy-based evaluation criteria, we also discuss requirements and strategies
for evaluating explanations of music recommendations. Finally, we describe the
current challenges for introducing explainability within a large-scale
industrial music recommender system and provide research perspectives.
Comment: To appear in AI Magazine, Special Topic on Recommender Systems 2022
MesoNet: a Compact Facial Video Forgery Detection Network
This paper presents a method to automatically and efficiently detect face tampering in videos, focusing in particular on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face. Traditional image forensics techniques are usually not well suited to videos because compression strongly degrades the data. This paper therefore follows a deep learning approach and presents two networks, both with a low number of layers so as to focus on the mesoscopic properties of images. We evaluate these fast networks on an existing dataset and on a dataset we assembled from online videos. The tests demonstrate a very successful detection rate, with more than 98% for Deepfake and 95% for Face2Face.
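The sketch below illustrates the kind of compact, low-depth CNN this abstract refers to: a handful of convolution/pooling blocks followed by a small classifier, operating on face crops. Layer sizes, the 256x256 input and the training objective are illustrative assumptions, not a verified reproduction of the published MesoNet architectures.

```python
# Hedged sketch: a compact, few-layer CNN for binary forged/real frame classification,
# in the spirit of the low-depth "mesoscopic" networks described in the abstract.
import torch
import torch.nn as nn

class CompactForgeryNet(nn.Module):
    def __init__(self):
        super().__init__()
        def block(c_in, c_out, kernel, pool):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel, padding=kernel // 2),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(pool),
            )
        # Keep the network shallow so it focuses on mid-level ("mesoscopic") image statistics.
        self.features = nn.Sequential(
            block(3, 8, 3, 2),
            block(8, 8, 5, 2),
            block(8, 16, 5, 2),
            block(16, 16, 5, 4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(16 * 8 * 8, 16),
            nn.LeakyReLU(0.1),
            nn.Dropout(0.5),
            nn.Linear(16, 1),          # logit; apply sigmoid in the loss (BCEWithLogitsLoss)
        )

    def forward(self, x):              # x: (batch, 3, 256, 256) face crops
        return self.classifier(self.features(x))

model = CompactForgeryNet()
logits = model(torch.randn(2, 3, 256, 256))
print(logits.shape)                    # torch.Size([2, 1])
```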